
    Assessing Grammatical Correctness in Language Learning

    We present experiments on assessing the grammatical correctness of learners’ answers in a language-learning System (references to the System, and links to the released data and code, are withheld for anonymity). In particular, we explore the problem of detecting alternative-correct answers: cases where more than one inflected form of a lemma fits syntactically and semantically in a given context. We approach the problem with methods for grammatical error detection (GED), since we hypothesize that models for detecting grammatical mistakes can assess the correctness of potential alternative answers in a learning setting. Due to the paucity of training data, we explore the ability of pre-trained BERT to detect grammatical errors and then fine-tune it using synthetic training data. In this work, we focus on errors in inflection. Our experiments show (a) that pre-trained BERT performs worse at detecting grammatical irregularities for Russian than for English, and (b) that fine-tuned BERT yields promising results on assessing the correctness of grammatical exercises; (c) we also establish a new benchmark for Russian. To further investigate its performance, we compare fine-tuned BERT with one of the state-of-the-art models for GED (Bell et al., 2019) on our dataset and on RULEC-GEC (Rozovskaya and Roth, 2019). We release the manually annotated learner dataset, used for testing, for general use.
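
    The sketch below is a minimal illustration (not the authors' released code) of the approach the abstract describes: a pre-trained BERT checkpoint is fine-tuned as a binary grammaticality classifier on synthetic sentence pairs in which inflection has been corrupted. The checkpoint name, the toy Russian examples, and the training loop are illustrative assumptions.

```python
# Hedged sketch: fine-tuning BERT for binary grammaticality classification
# on synthetic inflection errors. Not the paper's actual code or data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumption: any BERT checkpoint covering Russian

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy synthetic pairs: (sentence, label), 1 = grammatical, 0 = ungrammatical.
# In practice the ungrammatical side would be produced by corrupting inflections
# of correct sentences (e.g. replacing an accusative ending with nominative).
train_data = [
    ("Она читает книгу.", 1),   # correct: accusative object
    ("Она читает книга.", 0),   # corrupted: nominative instead of accusative
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # a real run would iterate over a large synthetic corpus
    for sentence, label in train_data:
        batch = tokenizer(sentence, return_tensors="pt", truncation=True)
        outputs = model(**batch, labels=torch.tensor([label]))
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# At inference time the classifier can score a learner's alternative answer in context,
# e.g. a different but admissible inflected form of the same lemma.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("Она читает книги.", return_tensors="pt")).logits
    print("grammatical" if logits.argmax(-1).item() == 1 else "ungrammatical")
```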

    Probabilistic Models for Alignment of Etymological Data

    Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011). Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 246-253. © 2011 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/16955.

    Digital cultural heritage and revitalization of endangered Finno-Ugric languages

    The preservation of linguistic diversity has long been recognized as a crucial, integral part of supporting our cultural heritage. Yet many “minority” languages (those that lack official state status) are in decline, and many are severely endangered. We present a prototype system aimed at “heritage” speakers of endangered Finno-Ugric languages. Heritage speakers are people who heard the language used by the older generations while they were growing up and who possess considerable passive competency, well beyond the “beginner” level, but lack active fluency. Our system is based on natural language processing and artificial intelligence. It assists learners by allowing them to learn from arbitrary texts of their choice and by creating exercises that engage them in active production of language, rather than in passive memorization of material. Continuous automatic assessment helps guide the learner toward improved fluency. We believe that providing such AI-based tools will help bring these languages to the forefront of the modern digital age, raise their prestige, and encourage the younger generations to become involved in reversing language decline.
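
    As a rough illustration of the kind of exercise generation described above (learning from an arbitrary text the learner chooses), the sketch below blanks out word forms and asks for the surface form, offering the lemma as a hint. The lemmatize() stub stands in for a real morphological analyzer, and the selection heuristic is an assumption for illustration, not the system's actual algorithm.

```python
# Hedged sketch: cloze-style exercise generation from an arbitrary text.
import random
import re

def lemmatize(token: str) -> str:
    """Placeholder: a real system would call a morphological analyzer here."""
    return token.lower()

def make_cloze(text: str, rate: float = 0.2, seed: int = 0):
    """Blank out roughly `rate` of the word tokens; return exercise text and answer key."""
    rng = random.Random(seed)
    tokens = re.findall(r"\w+|\W+", text)   # alternate word / non-word runs
    exercise, answers = [], []
    for tok in tokens:
        if tok.isalpha() and rng.random() < rate:
            answers.append(tok)                          # expected surface form
            exercise.append(f"_____ ({lemmatize(tok)})")  # show the lemma as a hint
        else:
            exercise.append(tok)
    return "".join(exercise), answers

ex, key = make_cloze("Heritage speakers often understand far more than they can produce.")
print(ex)
print(key)
```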

    Question Answering and Question Generation for Finnish

    Recent advances in the field of language modeling have improved the state of the art in question answering (QA) and question generation (QG). However, the development of modern neural models, their benchmarks, and the datasets for training them has mainly focused on English. Finnish, like many other languages, faces a shortage of large QA/QG training resources, which has prevented experimentation with state-of-the-art QA/QG fine-tuning methods. We present the first neural QA and QG models that work with Finnish. To train the models, we automatically translate the SQuAD dataset and then use normalization methods to reduce the amount of problematic data created during translation. Using the synthetic data, together with the Finnish partition of the TyDi-QA dataset, we fine-tune several transformer-based models for both QA and QG and evaluate their performance. To the best of our knowledge, the resulting dataset is the first large-scale QA/QG resource for Finnish. This paper also sets the initial benchmarks for Finnish-language QA and QG.
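
    The following is a minimal sketch of the QG fine-tuning setup described above: a multilingual seq2seq transformer is trained to map an answer span plus its context to a question. The checkpoint name, the input template, and the toy Finnish triple are illustrative assumptions, not the paper's exact configuration or data.

```python
# Hedged sketch: seq2seq fine-tuning for question generation in Finnish.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/mt5-small"  # assumption: any multilingual seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# One toy (context, answer, question) triple, as it might look after
# machine-translating SQuAD into Finnish and normalizing the result.
context = "Helsinki on Suomen pääkaupunki."
answer = "Helsinki"
question = "Mikä on Suomen pääkaupunki?"

source = f"generate question: {answer} context: {context}"
inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(question, return_tensors="pt").input_ids  # target question

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for _ in range(3):  # a real run would iterate over the full synthetic + TyDi-QA data
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Generation after fine-tuning:
model.eval()
generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```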

    Revita: a Language-learning Platform at the Intersection of ITS and CALL

    This paper presents Revita, a Web-based platform for language learning beyond the beginner level. We anchor the presentation in a survey of the literature on recent advances in the fields of computer-aided language learning (CALL) and intelligent tutoring systems (ITS). We outline the established desiderata of CALL and ITS and discuss how Revita addresses (the majority of) the theoretical requirements of CALL and ITS. Finally, we claim that, to the best of our knowledge, Revita is currently the only platform for learning/tutoring beyond the beginner level that is functional, freely available, and supports multiple languages.

    Revita: a System for Language Learning and Supporting Endangered Languages

    We describe a computational system for language learning and for supporting endangered languages. The platform provides the user with an opportunity to improve her competency through active language use. It currently works with several endangered Finno-Ugric languages, as well as with Yakut, Finnish, Swedish, and Russian. This paper describes the current stage of ongoing development.

    Relevance Prediction in Information Extraction using Discourse and Lexical Features

    Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011). Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 114-121. © 2011 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/16955.

    Multiple Admissibility in Language Learning: Judging Grammaticality using Unlabeled Data

    We present our work on the problem of detecting Multiple Admissibility (MA) in language learning. Multiple Admissibility occurs when more than one grammatical form of a word fits syntactically and semantically in a given context. In second-language education, in particular in intelligent tutoring systems and computer-aided language learning (ITS/CALL), systems generate exercises automatically, and MA implies that multiple alternative answers are possible. We treat the problem as a grammaticality judgement task. We train a neural network to label sentences as grammatical or ungrammatical, using a "simulated learner corpus": a dataset of correct text together with artificial errors generated automatically. While MA occurs commonly in many languages, this paper focuses on learning Russian. We present a detailed classification of the types of constructions in Russian in which MA is possible, and we evaluate the model using a test set built from answers provided by users of the Revita language-learning system.
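
    The sketch below illustrates one way a "simulated learner corpus" of the kind described above could be built: each correct Russian sentence is paired with a corrupted copy in which one word is replaced by a different inflected form of the same lemma. The use of pymorphy2 and the uniform choice of replacement form are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch: generating artificial inflection errors for a simulated learner corpus.
import random
import pymorphy2

morph = pymorphy2.MorphAnalyzer()
rng = random.Random(0)

def corrupt_inflection(sentence: str):
    """Return (corrupted_sentence, 0), or None if no replaceable word form is found."""
    tokens = sentence.split()
    for i in rng.sample(range(len(tokens)), len(tokens)):  # visit tokens in random order
        parse = morph.parse(tokens[i])[0]
        # All inflected forms of the same lemma, excluding the original form.
        alternatives = [f.word for f in parse.lexeme if f.word != parse.word]
        if alternatives:
            tokens[i] = rng.choice(alternatives)
            return " ".join(tokens), 0  # label 0 = ungrammatical
    return None

correct = "Она читает интересную книгу"
dataset = [(correct, 1)]            # label 1 = grammatical
corrupted = corrupt_inflection(correct)
if corrupted:
    dataset.append(corrupted)
print(dataset)
```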

    Benchmarks and models for entity-oriented polarity detection
